Abstract

We develop a query answering system whose core idea is query answering
by rewriting. For this purpose we extend the DL DL-Lite [5] with the
ability to support n-ary relations, obtaining the DL DLR-Lite, in which
query answering is still polynomial in the size of the data [3,4]. We
devise a flexible way of mapping the conceptual level to the relational
level, which provides users with an SQL-like query language over the
conceptual schema. The rewriting technique adds value to conventional
query answering techniques: it allows users to formulate simpler queries
and to infer additional information that was not stated explicitly in
the user query. The formalization of the conceptual schema and the
developed reasoning technique allow checking the consistency between
the database and the conceptual schema, thus improving the
trustworthiness of the information system.



1 Introduction 

The research we are currently carrying out is aimed at the development of
a query answering system that enables users to pose queries over the
conceptual schema of a database. Such a system provides added value over
conventional DBMSs, where users are exposed to the relational schema
only. At the core of our work is the idea of query answering by
rewriting.

In general, query answering by rewriting is divided into two phases. The
first re-expresses a user query posed over the conceptual schema in
terms of the relations of the underlying database, and the second
evaluates the rewriting over the underlying database (e.g., [1]).

Our approach uses a formalism based on Description Logics (DLs) [2]
to formalize the conceptual schema of the database. Specifically, we have
extended the DL DL-Lite [5] with the ability to support n-ary relations,
obtaining the DL DLR-Lite. Such a formalism is expressive enough to
capture basic Entity-Relationship or UML Class diagrams, while allowing
query answering that fully takes into account the constraints in the
conceptual schema and is still tractable (i.e., polynomial) in the size
of the data [3,4].

We have devised a flexible way of mapping the conceptual level to the
underlying relational level, which provides users with an SQL-like query
language over the conceptual schema. Queries at the conceptual level are
first translated into queries at the relational level by taking into
account the mapping of entities and relationships to the actual database
relations. To provide a complete answer to the query, the system then
uses the developed query rewriting technique to take into account the
constraints expressed in the conceptual schema. The initial user query is
thus translated into a set of SQL queries that are evaluated by the DBMS.

This rewriting technique adds value to conventional query answering
techniques. Firstly, the user can formulate simpler queries using only
terms defined in the conceptual schema, without taking into account
details specific to the relational database (e.g., join attributes).
Moreover, the query rewriting technique allows one to infer additional
information that was not stated explicitly in the user query but is
implied by the constraints at the conceptual level. Last but not least,
the formalization of the conceptual schema and the developed reasoning
technique allow checking the consistency of the underlying database
against the conceptual schema, which improves the trustworthiness of the
information system.

2 Formal Framework 

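A DLR-DB system is a triple $\mathcal{S} = (\mathcal{K}, \mathcal{R}, \mathcal{M})$,
where $\mathcal{K}$ is the knowledge base (KB) of $\mathcal{S}$,
$\mathcal{R}$ is a relational schema for $\mathcal{S}$, and $\mathcal{M}$
is the mapping between the KB $\mathcal{K}$ and the relational schema
$\mathcal{R}$.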

2.1 Conceptual Level 

We call our description logic language DLR-Lite. It allows one to
represent the domain of interest in terms of concepts, denoting sets of
objects, and relationships, denoting relations between objects. In the
language, basic concepts are defined as follows:

$B ::= A \mid \exists[i]R$

where $A$ denotes an atomic concept, $R$ an n-ary relationship, and
$1 \le i \le n$. Intuitively, $\exists[i]R$ denotes the projection of $R$
on the i-th component. Note that all concepts denote unary predicates.
For representing intensional knowledge in the KB, we have assertions of
the form:

$B_1 \sqsubseteq B_2$ (inclusion)
$B_1\ \mathit{disj}\ B_2$ (disjointness)
$(\mathit{funct}\ \exists[i]R)$ (functionality)

An inclusion assertion expresses that a basic concept is subsumed by
another concept, a disjointness assertion states that the set of objects
denoted by a basic concept $B_1$ is disjoint from the set denoted by
another concept $B_2$, while a functionality assertion expresses the
(global) functionality of a certain component of a relationship.

The formal meaning of the concept descriptions above is given in terms of
interpretations over a fixed countably infinite domain $\Delta$. We assume
that we have one constant for each object, denoting exactly that object.

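An interpretation $\mathcal{I} = (\Delta, \cdot^{\mathcal{I}})$ consists of
a first order structure over $\Delta$ with an interpretation function
$\cdot^{\mathcal{I}}$ such that:

$R^{\mathcal{I}} \subseteq \Delta^n$

$(\exists[i]R)^{\mathcal{I}} = \{\, c \mid \exists (c_1, \ldots, c_n) \in R^{\mathcal{I}},\ c = c_i \,\}$.

An interpretation $\mathcal{I}$ satisfies an inclusion assertion
$B_1 \sqsubseteq B_2$ iff $B_1^{\mathcal{I}} \subseteq B_2^{\mathcal{I}}$;
$\mathcal{I}$ satisfies a disjointness assertion $B_1\ \mathit{disj}\ B_2$
iff $B_1^{\mathcal{I}} \cap B_2^{\mathcal{I}} = \emptyset$; $\mathcal{I}$
satisfies a functionality assertion $(\mathit{funct}\ \exists[i]R)$ if
$(c_1, \ldots, c_i, \ldots, c_n) \in R^{\mathcal{I}} \wedge (c'_1, \ldots, c_i, \ldots, c'_n) \in R^{\mathcal{I}} \rightarrow c_1 = c'_1, \ldots, c_n = c'_n$.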

A model of a KB $\mathcal{K}$ is an interpretation $\mathcal{I}$ that
satisfies all the assertions in $\mathcal{K}$. A KB is satisfiable if it
has at least one model. A KB $\mathcal{K}$ logically implies an assertion
$\alpha$ if all the models of $\mathcal{K}$ satisfy $\alpha$.

The assertions presented above allow us to specify the typical constructs
used in conceptual modeling. Specifically:

- ISA, using assertions of the form $B_1 \sqsubseteq B_2$, stating that
the class $B_1$ is a subclass of the class $B_2$;

- class disjointness, using assertions of the form
$B_1\ \mathit{disj}\ B_2$, stating disjointness between the two classes
$B_1$ and $B_2$;

- role-typing, using assertions of the form $\exists[i]R \sqsubseteq B$,
stating that the i-th component of the relationship $R$ is of type $B$;

- participation constraints, using assertions of the form
$B \sqsubseteq \exists[i]R$, stating that instances of class $B$
participate in the relationship $R$ as the i-th component;

- non-participation constraints, using assertions of the form
$B\ \mathit{disj}\ \exists[i]R$, stating that instances of class $B$ do
not participate in the relationship $R$ as the i-th component;

- functionality restrictions, using assertions of the form
$(\mathit{funct}\ \exists[i]R)$, stating that an object can be the i-th
component of the relationship $R$ at most once.

Example 1 Consider atomic concepts Student, Professor and Course, the 
relationships Attends between Student and Course, Teaches between Pro- 
fessor and Course, and HasTutor between Student and Professor. We can 
now define the following inclusion, disjointness and functionality assertions: 

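$(A_1)$ $\exists[1]\mathit{Attends} \sqsubseteq \mathit{Student}$
$(A_2)$ $\exists[2]\mathit{Attends} \sqsubseteq \mathit{Course}$
$(A_3)$ $\exists[1]\mathit{Teaches} \sqsubseteq \mathit{Course}$
$(A_4)$ $\exists[2]\mathit{Teaches} \sqsubseteq \mathit{Professor}$
$(A_5)$ $\mathit{Professor} \sqsubseteq \exists[2]\mathit{Teaches}$
$(A_6)$ $\exists[1]\mathit{HasTutor} \sqsubseteq \mathit{Student}$
$(A_7)$ $\exists[2]\mathit{HasTutor} \sqsubseteq \mathit{Professor}$
$(A_8)$ $\mathit{Student} \sqsubseteq \exists[1]\mathit{Attends}$
$(A_9)$ $\mathit{Student} \sqsubseteq \exists[1]\mathit{HasTutor}$
$(A_{10})$ $\mathit{Course} \sqsubseteq \exists[2]\mathit{Attends}$
$(A_{11})$ $\mathit{Course} \sqsubseteq \exists[1]\mathit{Teaches}$
$(A_{12})$ $(\mathit{funct}\ \exists[1]\mathit{HasTutor})$
$(A_{13})$ $(\mathit{funct}\ \exists[1]\mathit{Teaches})$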

where $A_1$ states that everyone attending a course must be a student,
while $A_2$ states that every attended course must be one of the courses
on offer, etc. $A_{12}$ states that a student can have only one tutor,
and $A_{13}$ states that a course can be taught by only one professor.

We denote by Normalize($\mathcal{K}$) the DLR-Lite KB obtained by
transforming the KB $\mathcal{K}$ as follows. The KB $\mathcal{K}$ is
expanded by computing all disjointness assertions between basic concepts
implied by $\mathcal{K}$. More precisely, $\mathcal{K}$ is closed with
respect to the following inference rule: if $B_1 \sqsubseteq B_2$ occurs
in $\mathcal{K}$ and either $B_2\ \mathit{disj}\ B_3$ or
$B_3\ \mathit{disj}\ B_2$ occurs in $\mathcal{K}$, then add
$B_1\ \mathit{disj}\ B_3$ to $\mathcal{K}$.

It is immediate to see that, for every DLR-Lite KB $\mathcal{K}$,
Normalize($\mathcal{K}$) is equivalent to $\mathcal{K}$, in the sense that
the set of models of $\mathcal{K}$ coincides with that of
Normalize($\mathcal{K}$).

Given a DLR-DB system $\mathcal{S} = (\mathcal{K}, \mathcal{R}, \mathcal{M})$,
Normalize($\mathcal{S}$) $= (\mathcal{K}_n, \mathcal{R}, \mathcal{M})$,
where $\mathcal{K}_n$ = Normalize($\mathcal{K}$).
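
The closure rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: assertions are encoded as pairs of concept names, and the concept GradStudent is hypothetical (it does not appear in Example 1).

```python
def normalize(inclusions, disjointness):
    """Close the disjointness assertions under the rule:
    if B1 is included in B2 and (B2 disj B3 or B3 disj B2), add B1 disj B3.

    inclusions:   set of pairs (B1, B2), read "B1 is included in B2"
    disjointness: set of pairs (B1, B2), read "B1 disj B2"
    Returns the closed set of disjointness pairs.
    """
    closed = set(disjointness)
    changed = True
    while changed:  # repeat until no new assertion can be derived
        changed = False
        for b1, b2 in inclusions:
            for x, y in list(closed):
                if x == b2:
                    new = (b1, y)
                elif y == b2:
                    new = (b1, x)
                else:
                    continue
                if new not in closed:
                    closed.add(new)
                    changed = True
    return closed


# Hypothetical input: GradStudent is a subclass of Student,
# and Student is disjoint from Professor.
example = normalize({("GradStudent", "Student")}, {("Student", "Professor")})
```

The loop runs to a fixpoint, so disjointness is also propagated along chains of inclusions.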

2.2 Relational Level 

At the relational level we consider relations, where each relation has an
associated sequence of typed attributes. Each relation may have a sequence
of one or more components, where each component is a sequence of attributes
of the relation. Components may not overlap. We call attributes that do not
belong to any component additional attributes of the relation. Note that
the order of components and the order of attributes are not necessarily
related to each other.

2.3 Mapping from Conceptual to Relational Level 

We can now define the mapping $\mathcal{M}$ between the conceptual and the
relational level as follows:

• to each atomic concept $A$, $\mathcal{M}$ associates a relation
$\mathcal{M}(A)$ with a single component;

• to each n-ary relationship $R$, $\mathcal{M}$ associates a relation
$\mathcal{M}(R)$ with n components.

The mapping induces a signature on basic concepts. Specifically:

• for an atomic concept $A$, the signature is the sequence of types of
the attributes of the component of the relation corresponding to $A$;

• for a concept of the form $\exists[i]R$, the signature is the sequence
of types of the i-th component of the relation corresponding to $R$.

A mapping $\mathcal{M}$ is consistent with the conceptual level
$\mathcal{K}$ and the relational level $\mathcal{R}$ of a system
$\mathcal{S} = (\mathcal{K}, \mathcal{R}, \mathcal{M})$ if, for each
inclusion assertion $B_1 \sqsubseteq B_2$ in $\mathcal{K}$, the signature
of $B_1$ is equal to the signature of $B_2$. Note that for disjointness
assertions $B_1\ \mathit{disj}\ B_2$, we do not require $B_1$ and $B_2$ to
have the same signature. Indeed, if $B_1$ and $B_2$ have different
signatures, the disjointness assertion is trivially satisfied at the
relational level. In the following, we always assume that in a system
$\mathcal{S} = (\mathcal{K}, \mathcal{R}, \mathcal{M})$ the mapping
$\mathcal{M}$ is consistent with $\mathcal{K}$ and $\mathcal{R}$.
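
As a minimal sketch of this consistency condition, the Python fragment below compares signatures for each inclusion assertion. The encoding is an assumption made for illustration: an atomic concept is a string, a concept of the form ∃[i]R is a pair (R, i), and the attribute types ("text") are invented, since the source does not list them.

```python
def signature(concept, concept_sig, relationship_sig):
    """Return the signature (tuple of attribute types) of a basic concept.

    concept is either an atomic concept name or a pair (R, i) standing
    for the i-th component of relationship R (i counted from 1).
    """
    if isinstance(concept, tuple):
        rel, i = concept
        return relationship_sig[rel][i - 1]
    return concept_sig[concept]


def mapping_is_consistent(inclusions, concept_sig, relationship_sig):
    """True if both sides of every inclusion have the same signature."""
    return all(
        signature(b1, concept_sig, relationship_sig)
        == signature(b2, concept_sig, relationship_sig)
        for b1, b2 in inclusions
    )


# Assumed types: a student is identified by two text attributes,
# a course by one text attribute.
CONCEPT_SIG = {"Student": ("text", "text"), "Course": ("text",)}
RELATIONSHIP_SIG = {"Attends": [("text", "text"), ("text",)]}

ok = mapping_is_consistent(
    [(("Attends", 1), "Student"), (("Attends", 2), "Course")],
    CONCEPT_SIG, RELATIONSHIP_SIG,
)
bad = mapping_is_consistent(
    [("Course", ("Attends", 1))], CONCEPT_SIG, RELATIONSHIP_SIG
)
```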



Example 1 (contd.) In the table below for all atomic concepts and 
relationships the mapping associates the corresponding relations with 
components (underlined) and additional attributes. 



Concept/Relationship   Relation
--------------------   ----------------------------------------------------
Student                StudentTable(SName, SSurname, EnrollNumber)
Course                 CourseTable(CourseId, Name, Category)
Professor              ProfessorTable(PName, PSurname, Degree)
Attends                AttendsTable(SName, SSurname, CourseId, Year)
Teaches                TeachesTable(PName, PSurname, CourseId, Semester)
HasTutor               HasTutorTable(SName, SSurname, PName, PSurname)

2.4 Semantics of a System S

In order to define the semantics of a system
$\mathcal{S} = (\mathcal{K}, \mathcal{R}, \mathcal{M})$, we first extend
the mapping $\mathcal{M}$ to a mapping $\mathcal{M}_c$ from basic concepts
to components of relations as follows:

• for an atomic concept $A$, let $\vec{A}$ be the sequence of attributes
corresponding to the only component of $\mathcal{M}(A)$. Then
$\mathcal{M}_c(A) = \pi_{\vec{A}}(\mathcal{M}(A))$;

• for a relationship $R$, let $\vec{A}$ be the sequence of attributes
corresponding to the i-th component of $\mathcal{M}(R)$. Then
$\mathcal{M}_c(\exists[i]R) = \pi_{\vec{A}}(\mathcal{M}(R))$.

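A database instance (or simply database) $\mathcal{D}$ over the relational
schema $\mathcal{R}$ is a set of facts of the form $R(\vec{c})$, where $R$
is a relation of arity n in $\mathcal{R}$ and $\vec{c}$ is an n-tuple of
constants of $\Delta$. A database $\mathcal{D}$ satisfies w.r.t.
$\mathcal{S}$

• an inclusion assertion $B_1 \sqsubseteq B_2$, if
$(\mathcal{M}_c(B_1))^{\mathcal{D}} \subseteq (\mathcal{M}_c(B_2))^{\mathcal{D}}$;

• a disjointness assertion $B_1\ \mathit{disj}\ B_2$, if
$(\mathcal{M}_c(B_1))^{\mathcal{D}} \cap (\mathcal{M}_c(B_2))^{\mathcal{D}} = \emptyset$;

• a functionality assertion $(\mathit{funct}\ \exists[i]R)$, if the
cardinality of $(\mathcal{M}_c(\exists[i]R))^{\mathcal{D}}$ is equal to
the cardinality of $(\mathcal{M}(R))^{\mathcal{D}}$. In other words, the
set of attributes of the i-th component of $R$ is a key of
$R^{\mathcal{D}}$.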

A database $\mathcal{D}$ is said to be consistent w.r.t. a system
$\mathcal{S} = (\mathcal{K}, \mathcal{R}, \mathcal{M})$ if it satisfies
w.r.t. $\mathcal{S}$ all assertions in $\mathcal{K}$. A database
$\mathcal{D}$ is said to be df-consistent w.r.t. $\mathcal{S}$ if it
satisfies w.r.t. $\mathcal{S}$ all disjointness and functionality
assertions in $\mathcal{K}$.

3 Queries 

3.1 Queries over Conceptual Level 

Queries over a DLR-DB system
$\mathcal{S} = (\mathcal{K}, \mathcal{R}, \mathcal{M})$ are specified using
an SQL-like syntax corresponding to SPJ queries. More precisely, such a
query is written in the form:

SELECT ⟨attribute_specifications⟩
FROM ⟨relationship_specifications⟩
WHERE ⟨selection_conditions⟩

where 

• ⟨relationship_specifications⟩ denotes the concepts and relationships
involved in the query and the way they join together. It is defined as
follows:

⟨relationship_specifications⟩ ::= ⟨relspec⟩ | ⟨relationship_specifications⟩ , ⟨relspec⟩
⟨relspec⟩ ::= ⟨join⟩ ON ⟨conditions⟩
⟨join⟩ ::= ⟨relationship⟩ | ⟨join⟩ JOIN ⟨relationship⟩
⟨relationship⟩ ::= $C_i$ AS $V_i$
⟨conditions⟩ ::= ⟨equality⟩ | ⟨conditions⟩ AND ⟨equality⟩
⟨equality⟩ ::= $e_i = e_j$

Intuitively, ⟨relationship_specifications⟩ is a sequence of expressions of
one of the following forms:

- $C$ AS $V$

- $C_1$ AS $V_1$ JOIN $C_2$ AS $V_2$ JOIN $\cdots$ JOIN $C_k$ AS $V_k$
  ON $e_1 = e_2$ AND $\cdots$ AND $e_{h-1} = e_h$

where

- each $C_j$ denotes the name of a relationship or an atomic concept
in $\mathcal{K}$;

- each $V_j$ is a unique variable name, associated with $C_j$;

- in the equalities $e_i = e_j$, each $e_i$ or $e_j$ is either

  * $V$, if $V$ is a variable corresponding to an atomic concept;

  * $V.i$, if $V$ is a variable corresponding to a relationship of
    arity $n \ge i$;

- the signatures of the two associated concept/relationship components
must be the same.

• ⟨attribute_specifications⟩ is a sequence of attributes of the form
$V.a$, where $V$ is a variable in ⟨relationship_specifications⟩ associated
with a concept or relationship $C$, and $a$ is an attribute of the
relation $\mathcal{M}(C)$;

• ⟨selection_conditions⟩ is a set of equalities, each of one of the
following forms:

- $V_1.a_1 = V_2.a_2$,

- $V_i.a_i = c$,

where each $V_i$ is a variable in ⟨relationship_specifications⟩ associated
with a concept or relationship $C_i$, $a_i$ is an attribute of the relation
$\mathcal{M}(C_i)$, and $c$ is a constant.

Note that relationships and atomic concepts may be repeated. 
3.2 Conjunctive Queries over the Relational Level



In this section we first recall the notion of a conjunctive query (CQ). After- 
wards we present how a CQ over the relational level can be obtained from 
a query over the conceptual level. 

3.2.1 Conjunctive Queries 

A term is either a variable or a constant. An atom is an expression
$p(z_1, \ldots, z_n)$, where $p$ is a predicate (relation) of arity n and
$z_1, \ldots, z_n$ are terms. A conjunctive query $q$ over a knowledge base
$\mathcal{K}$ is an expression of the form

$q(\vec{x}) \leftarrow \exists \vec{y}.\, conj(\vec{x}, \vec{y})$

where $\vec{x}$ are the so-called distinguished variables, $\vec{y}$ are
existentially quantified variables called non-distinguished variables, and
$conj(\vec{x}, \vec{y})$ is a conjunction of atoms of the form
$T(z_1, \ldots, z_n)$, where $T$ is a relation of $\mathcal{R}$ with n
attributes and $z_1, \ldots, z_n$ are terms. $q(\vec{x})$ is called the
head of $q$ and $\exists \vec{y}.\, conj(\vec{x}, \vec{y})$ the body of $q$.

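The answer of a query
$q(\vec{x}) \leftarrow \exists \vec{y}.\, conj(\vec{x}, \vec{y})$ over a
database $\mathcal{D}$ is the set $q^{\mathcal{D}}$ of tuples $\vec{c}$ of
constants in the domain $\Delta$ such that, when we substitute the
variables $\vec{x}$ with the constants $\vec{c}$, the formula
$\exists \vec{y}.\, conj(\vec{x}, \vec{y})$ evaluates to true in
$\mathcal{D}$.

A union of conjunctive queries (UCQ) is an expression

$q(\vec{x}) \leftarrow \exists \vec{y}_1.\, conj_1(\vec{x}, \vec{y}_1) \vee \cdots \vee \exists \vec{y}_m.\, conj_m(\vec{x}, \vec{y}_m)$

where for each $i \in \{1, \ldots, m\}$, $conj_i(\vec{x}, \vec{y}_i)$ is a
conjunction of atoms.

The answer of a UCQ
$q(\vec{x}) \leftarrow \exists \vec{y}_1.\, conj_1(\vec{x}, \vec{y}_1) \vee \cdots \vee \exists \vec{y}_m.\, conj_m(\vec{x}, \vec{y}_m)$
over a database $\mathcal{D}$ is the union of the answers of the
conjunctive queries

$q_1(\vec{x}) \leftarrow \exists \vec{y}_1.\, conj_1(\vec{x}, \vec{y}_1)$
$\vdots$
$q_m(\vec{x}) \leftarrow \exists \vec{y}_m.\, conj_m(\vec{x}, \vec{y}_m)$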
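
The two definitions of answers can be illustrated with a small Python evaluator. It is a sketch under assumptions, not the system's evaluation strategy (the system delegates evaluation to the DBMS): variables are strings starting with "?", every other term is a constant, a database is a dictionary from relation names to sets of tuples, and the sample rows are invented.

```python
def is_var(term):
    """Variables are encoded as strings that start with '?'."""
    return isinstance(term, str) and term.startswith("?")


def eval_cq(head, body, db):
    """Answer of a conjunctive query: head is a tuple of variables,
    body is a list of atoms (relation_name, argument_tuple)."""
    def extend(i, binding):
        if i == len(body):  # all atoms matched: emit the head tuple
            yield tuple(binding[v] for v in head)
            return
        rel, args = body[i]
        for fact in db.get(rel, ()):
            if len(fact) != len(args):
                continue
            new = dict(binding)
            ok = True
            for a, c in zip(args, fact):
                if is_var(a):
                    if new.setdefault(a, c) != c:  # conflicting binding
                        ok = False
                        break
                elif a != c:  # constant mismatch
                    ok = False
                    break
            if ok:
                yield from extend(i + 1, new)

    return set(extend(0, {}))


def eval_ucq(head, bodies, db):
    """Answer of a UCQ: the union of the answers of its disjuncts."""
    answers = set()
    for body in bodies:
        answers |= eval_cq(head, body, db)
    return answers


# Invented sample data over two relations of Example 1.
DB = {
    "StudentTable": {("Ann", "Lee", 1), ("Bob", "Kim", 2)},
    "AttendsTable": {("Ann", "Lee", "AB23INF", 2006),
                     ("Bob", "Kim", "XY01", 2006),
                     ("Eve", "Roe", "AB23INF", 2006)},
}
Q1 = [("StudentTable", ("?n", "?s", "?e")),
      ("AttendsTable", ("?n", "?s", "AB23INF", "?y"))]
Q2 = [("AttendsTable", ("?n", "?s", "AB23INF", "?y"))]
```

With this data, the first disjunct alone misses the attendee who has no row in StudentTable, while the union of both disjuncts returns her as well.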

3.2.2 Converting Conceptual Queries to Conjunctive Queries 

Given a DLR-DB system $\mathcal{S} = (\mathcal{K}, \mathcal{R}, \mathcal{M})$,
the conversion of a query $q$ over the conceptual level into a conjunctive
query is done in two steps:

1. the query $q$ is converted into a standard SQL select-project-join
query $q'$ over the relational schema $\mathcal{R}$;

2. $q'$ is converted into a conjunctive query using the standard
translation.

In this conversion, the order of attributes of a relation $R$, specified at
the relational level, is preserved in the atoms for $R$.

In order to convert our conceptual queries to standard SQL queries, first
each relationship $C_j$ is substituted with $\mathcal{M}(C_j)$. Each
equality $e_1 = e_2$ in the conceptual query is then substituted with the
conjunction of equalities between the attributes corresponding to the
components mentioned in $e_1$ and $e_2$.

Example 1 (contd.) Suppose we want to know the surnames of all stu- 
dents that attend the course with ID "AB23INF". We formulate the con- 
ceptual query as follows: 

SELECT S.Surname
FROM Student AS S JOIN Attends AS A ON S = A.1
WHERE A.Course = "AB23INF"

After the rewriting we get the following SQL query: 

SELECT S.Surname
FROM StudentTable AS S JOIN AttendsTable AS A
     ON S.Name = A.Name AND S.Surname = A.Surname
WHERE A.Course = "AB23INF"

Given a query $q$ over $\mathcal{S}$, we denote with $CQ(q, \mathcal{S})$
the conjunctive query over $\mathcal{R}$ resulting from the above
conversion.
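
The substitution of a conceptual equality by attribute equalities can be sketched as follows. The function name and the data layout are hypothetical; the component attributes (Name, Surname, Course) follow the attribute names used in the example query above and are an assumption about which attributes form each component.

```python
def expand_equality(e1, e2, var_concept, components):
    """Expand a conceptual equality such as 'S = A.1' into SQL equalities.

    e1, e2:      'V' (variable of an atomic concept) or 'V.i' (i-th
                 component of the relationship bound to V, i from 1)
    var_concept: variable name -> concept or relationship name
    components:  concept or relationship name -> list of components,
                 each component being a list of attribute names
    """
    def attrs(expr):
        var, _, idx = expr.partition(".")
        position = int(idx) - 1 if idx else 0  # a concept has one component
        component = components[var_concept[var]][position]
        return [f"{var}.{a}" for a in component]

    left, right = attrs(e1), attrs(e2)
    if len(left) != len(right):
        raise ValueError("components have different signatures")
    return " AND ".join(f"{l} = {r}" for l, r in zip(left, right))


# Assumed component layout for the example query.
COMPONENTS = {
    "Student": [["Name", "Surname"]],
    "Attends": [["Name", "Surname"], ["Course"]],
}
condition = expand_equality(
    "S", "A.1", {"S": "Student", "A": "Attends"}, COMPONENTS
)
```

Here `condition` is the ON clause of the SQL query shown in the example. Only the number of attributes is checked in this sketch, whereas the text requires the component signatures (types) to match.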

In order to evaluate the queries we get from the rewriting procedure using
a relational DBMS, we need to convert them back to SQL. In doing so, we
again make use of the order of attributes specified at the relational
level. We denote the conversion of a CQ $q$ to SQL with
$SQL(q, \mathcal{R})$.

3.3 Reasoning in DLR-DB system S 

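Given a DLR-DB system $\mathcal{S} = (\mathcal{K}, \mathcal{R}, \mathcal{M})$,
a conceptual query $q$ over $\mathcal{S}$ and a database $\mathcal{D}$ over
$\mathcal{R}$, the certain answers $ans(q, \mathcal{S}, \mathcal{D})$ are
the set of tuples $\vec{c}$ of constants of $\Delta$ such that
$\vec{c} \in q^{\mathcal{D}'}$ for every database $\mathcal{D}'$ that
includes $\mathcal{D}$ and is consistent with $\mathcal{S}$.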

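The basic reasoning services over a DLR-DB system
$\mathcal{S} = (\mathcal{K}, \mathcal{R}, \mathcal{M})$ are:

• KB satisfiability: verify whether a KB is satisfiable.

• query answering: given a DLR-DB system
$\mathcal{S} = (\mathcal{K}, \mathcal{R}, \mathcal{M})$, a conceptual query
$q$ over $\mathcal{S}$ and a database $\mathcal{D}$ over $\mathcal{R}$,
return the certain answers $ans(q, \mathcal{S}, \mathcal{D})$.

• query rewriting: given a DLR-DB system
$\mathcal{S} = (\mathcal{K}, \mathcal{R}, \mathcal{M})$ and a conceptual
query $q$ over $\mathcal{S}$, return a query $q_r$ over $\mathcal{R}$ such
that $q_r^{\mathcal{D}} = ans(q, \mathcal{S}, \mathcal{D})$ for every
database $\mathcal{D}$ that is df-consistent with Normalize($\mathcal{S}$).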
4 Query Rewriting in System S



In this section we present an algorithm that computes the perfect rewriting 
of a UCQ. Before proceeding, we address some preliminary issues. 

df-consistency of $\mathcal{D}$ w.r.t. $\mathcal{S}$. The algorithm
Consistent takes as input a normalized KB $\mathcal{K}$ and checks whether
one of the following conditions holds:

- there exists a disjointness assertion $B_1\ \mathit{disj}\ B_2$ such that
$(\mathcal{M}_c(B_1))^{\mathcal{D}} \cap (\mathcal{M}_c(B_2))^{\mathcal{D}} \neq \emptyset$;

- there exists a functionality assertion $(\mathit{funct}\ \exists[i]R)$
such that the cardinality of $(\mathcal{M}_c(\exists[i]R))^{\mathcal{D}}$
is not equal to the cardinality of $(\mathcal{M}(R))^{\mathcal{D}}$.

Informally, the first condition corresponds to checking whether
$\mathcal{D}$ explicitly contradicts some disjointness assertion in
$\mathcal{K}$, and the second condition corresponds to checking whether
$\mathcal{D}$ violates some functionality assertion in $\mathcal{K}$.
If at least one of the above conditions holds, then the algorithm returns
false, i.e., $\mathcal{D}$ is not df-consistent w.r.t. $\mathcal{S}$.
Otherwise, the algorithm returns true.
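
A minimal Python sketch of this check is given below. It assumes that the mapping $\mathcal{M}_c$ has already been applied, so that each basic concept is given as a relation name plus the attribute positions of its component; the disjointness between Student and Professor and all rows are invented for illustration.

```python
def project(rows, positions):
    """Projection of a set of tuples on the given attribute positions."""
    return {tuple(row[p] for p in positions) for row in rows}


def df_consistent(db, disjointness, functionality):
    """Return False if the database violates a disjointness or a
    functionality assertion, True otherwise.

    db:            relation name -> set of tuples
    disjointness:  list of ((rel1, positions1), (rel2, positions2))
    functionality: list of (rel, positions of the functional component)
    """
    for (r1, p1), (r2, p2) in disjointness:
        if project(db.get(r1, set()), p1) & project(db.get(r2, set()), p2):
            return False  # the two projections share a tuple
    for rel, positions in functionality:
        rows = db.get(rel, set())
        if len(project(rows, positions)) != len(rows):
            return False  # the component is not a key of the relation
    return True


# Hypothetical assertions: Student disj Professor (not part of Example 1)
# and (funct ∃[1]HasTutor), i.e. assertion A12.
DISJ = [(("StudentTable", (0, 1)), ("ProfessorTable", (0, 1)))]
FUNCT = [("HasTutorTable", (0, 1))]

DB_OK = {
    "StudentTable": {("Ann", "Lee", 1)},
    "ProfessorTable": {("Tom", "Ray", "PhD")},
    "HasTutorTable": {("Ann", "Lee", "Tom", "Ray")},
}
# Same data, but the student now has two tutors.
DB_TWO_TUTORS = dict(DB_OK, HasTutorTable={("Ann", "Lee", "Tom", "Ray"),
                                           ("Ann", "Lee", "Sue", "Fox")})
```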

4.1 Rewriting 

The basic idea of the method is to reformulate the query taking into
account the KB $\mathcal{K}$ [4]: in particular, given a query $q$ over the
conceptual schema $\mathcal{K}$, we compile the assertions of the KB into
the query itself, thus obtaining a new query $q'$. This new query is then
evaluated over the database instance $\mathcal{D}$.

We say that an argument of an atom in a query is bound if it corresponds
to either a distinguished variable, a shared variable (i.e., a variable
occurring at least twice in the query body), or a constant, while we say
that it is unbound if it corresponds to a non-distinguished non-shared
variable.
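
The bound/unbound classification can be computed directly from a query, as in the following sketch. The encoding is an assumption: variables are strings starting with "?", other terms are constants, and the body is a list of (relation, arguments) pairs.

```python
from collections import Counter


def is_var(term):
    """Variables are encoded as strings that start with '?'."""
    return isinstance(term, str) and term.startswith("?")


def bound_positions(head, body):
    """For each atom of the body, return a list of booleans that is True
    where the argument is bound: a constant, a distinguished variable,
    or a variable occurring at least twice in the body."""
    occurrences = Counter(a for _, args in body for a in args if is_var(a))
    return [
        [(not is_var(a)) or a in head or occurrences[a] >= 2 for a in args]
        for _, args in body
    ]


# q(?s) <- StudentTable(?n, ?s, ?e), AttendsTable(?n, ?s, "AB23INF", ?y)
BODY = [("StudentTable", ("?n", "?s", "?e")),
        ("AttendsTable", ("?n", "?s", "AB23INF", "?y"))]
flags = bound_positions(("?s",), BODY)
```

In this query only `?e` and `?y` are unbound: they are neither distinguished nor shared.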

Definition 4.1 We indicate with $gr(g, I)$ the atom obtained from the atom
$g$ by applying the inclusion assertion $I$ as follows.

An inclusion assertion $B \sqsubseteq A$ (resp. $B \sqsubseteq \exists[i]R$)
is applicable to an atom $T(x_1, \ldots, x_n)$ if

(i) $\mathcal{M}(A) = T$ (resp. $\mathcal{M}(R) = T$);

(ii) all variables among $x_1, \ldots, x_n$ that are in positions of $T$
that are not part of the only (resp. the i-th) component of $T$ are
unbound.

For $g = T(x_1, \ldots, x_n)$, $gr(g, A_1 \sqsubseteq A_2)$ is the atom
$T'(x'_1, \ldots, x'_n)$, where

• $T' = \mathcal{M}(A_1)$, $T = \mathcal{M}(A_2)$;

• the variables in $T'(x'_1, \ldots, x'_n)$ that correspond to the only
component of $T'$ are equal to the ones that correspond to the only
component of $T$;

• the remaining variables in $T'(x'_1, \ldots, x'_n)$ are fresh.
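
The following Python sketch illustrates Definition 4.1. The text defines $gr$ explicitly only for inclusions between atomic concepts; treating a concept of the form ∃[i]R in the same way, through the positions of its i-th component, is an assumption of this sketch, as are the function name, the data layout and the component positions.

```python
import itertools

_fresh = itertools.count()  # source of fresh variable names


def apply_inclusion(atom, bound, lhs, rhs, mapping):
    """Sketch of gr(g, lhs ⊑ rhs).

    atom:    (relation, args) for the atom g
    bound:   one boolean per argument of g, True if the argument is bound
    mapping: basic concept -> (relation, arity, component positions);
             a concept ∃[i]R is encoded as the pair (R, i)
    Returns the rewritten atom, or None if the inclusion is not applicable.
    """
    rel, args = atom
    rel_rhs, arity_rhs, comp_rhs = mapping[rhs]
    if rel != rel_rhs or len(args) != arity_rhs:
        return None  # condition (i) fails
    if any(bound[p] for p in range(arity_rhs) if p not in comp_rhs):
        return None  # condition (ii) fails
    rel_lhs, arity_lhs, comp_lhs = mapping[lhs]
    new_args = [f"?f{next(_fresh)}" for _ in range(arity_lhs)]
    for p_lhs, p_rhs in zip(comp_lhs, comp_rhs):
        new_args[p_lhs] = args[p_rhs]  # copy the component arguments
    return (rel_lhs, tuple(new_args))


# Assumed component positions for two tables of Example 1.
MAPPING = {
    "Student": ("StudentTable", 3, (0, 1)),
    ("Attends", 1): ("AttendsTable", 4, (0, 1)),
}
# Apply A1 (∃[1]Attends ⊑ Student) to StudentTable(?n, ?s, ?e),
# where ?e is unbound.
rewritten = apply_inclusion(
    ("StudentTable", ("?n", "?s", "?e")), [True, True, False],
    ("Attends", 1), "Student", MAPPING,
)
```

The result is an atom over AttendsTable whose first two arguments are `?n` and `?s` and whose remaining arguments are fresh variables.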

Definition 4.2 Given an atom $g_1 = r(X_1, \ldots, X_n)$ and an atom
$g_2 = r(Y_1, \ldots, Y_n)$, we say that $g_1$ and $g_2$ unify if there
exists a variable substitution $\theta$ such that
$\theta(g_1) = \theta(g_2)$. Each such $\theta$ is called a unifier.
Moreover, if $g_1$ and $g_2$ unify, we denote by $mgu(g_1, g_2)$ a most
general unifier of $g_1$ and $g_2$.
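
A most general unifier of two function-free atoms can be computed with a single pass over the argument pairs, as in the sketch below. The encoding (variables as strings starting with "?", constants otherwise, substitutions as dictionaries) is an assumption made for illustration.

```python
def is_var(term):
    """Variables are encoded as strings that start with '?'."""
    return isinstance(term, str) and term.startswith("?")


def mgu(atom1, atom2):
    """Return a most general unifier of two atoms as a dictionary
    variable -> term, or None if the atoms do not unify."""
    (rel1, args1), (rel2, args2) = atom1, atom2
    if rel1 != rel2 or len(args1) != len(args2):
        return None
    subst = {}

    def resolve(term):
        # Follow the substitution until a constant or a free variable.
        while is_var(term) and term in subst:
            term = subst[term]
        return term

    for s, t in zip(args1, args2):
        s, t = resolve(s), resolve(t)
        if s == t:
            continue
        if is_var(s):
            subst[s] = t
        elif is_var(t):
            subst[t] = s
        else:
            return None  # two different constants
    return {v: resolve(v) for v in subst}


theta = mgu(("T", ("?x", "?y", "c")), ("T", ("?z", "?z", "?w")))
```

Since atoms contain no function symbols, no occurs check is needed.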

We are now ready to define the algorithm Rewrite. 

Algorithm. First, the SQL query is translated into a conjunctive query
using the standard SQL-to-CQ translation. Then the Rewrite algorithm is
applied. Note that the order of the variables, which is the one given by
the translation from SQL to CQ, must be preserved.

algorithm Rewrite($q$, $\mathcal{S}$)
input: conjunctive query $q$, DLR-DB system $\mathcal{S} = (\mathcal{K}, \mathcal{R}, \mathcal{M})$
output: union of conjunctive queries $P$
$P := \{q\}$;
repeat
    $P' := P$;
    for each $q \in P'$ do
        (a) for each $g$ in $q$ do
                for each $I$ in $\mathcal{K}$ do
                    if $I$ is applicable to $g$
                    then $P := P \cup \{\, q[g/gr(g, I)] \,\}$;
        (b) for each $g_1, g_2$ in $q$ do
                if $g_1$ and $g_2$ unify
                then $P := P \cup \{\, reduce(q, g_1, g_2) \,\}$;
until $P' = P$;
return $P$

In the algorithm, $q[g/g']$ denotes the query obtained from $q$ by
replacing the atom $g$ with a new atom $g'$.

Informally, the algorithm Rewrite first reformulates the atoms of each
query $q \in P'$ and produces a new query for each atom reformulation
(step (a)) [5]. More precisely, if there exist an inclusion assertion $I$
and a conjunctive query $q \in P'$ containing an atom $g$ to which $I$ is
applicable, then the algorithm adds to $P$ the query obtained from $q$ by
replacing $g$ with $gr(g, I)$. In step (b), for each pair of atoms
$g_1, g_2$ that unify, the algorithm Rewrite computes the query
$q' = reduce(q, g_1, g_2)$, obtained from $q$ by the following algorithm:
algorithm reduce($q$, $g_1$, $g_2$)
input: conjunctive query $q$, atoms $g_1, g_2 \in body(q)$
output: reduced conjunctive query $q'$
$q' := q$;
$\sigma := mgu(g_1, g_2)$;
$body(q') := body(q') - \{g_2\}$;
$q' := \sigma(q')$;
return $q'$

Informally, the algorithm reduce starts by eliminating $g_2$ from the
query body; then the substitution $mgu(g_1, g_2)$ is applied to the whole
query (both the head and the body).
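
The reduce step can be sketched in Python as follows. The query encoding (head as a tuple of terms, body as a list of atoms, variables as strings starting with "?") and the function names are assumptions; a compact unifier for function-free atoms is included so that the fragment is self-contained.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def mgu(atom1, atom2):
    """Most general unifier of two function-free atoms, or None."""
    (rel1, args1), (rel2, args2) = atom1, atom2
    if rel1 != rel2 or len(args1) != len(args2):
        return None
    subst = {}

    def resolve(term):
        while is_var(term) and term in subst:
            term = subst[term]
        return term

    for s, t in zip(args1, args2):
        s, t = resolve(s), resolve(t)
        if s == t:
            continue
        if is_var(s):
            subst[s] = t
        elif is_var(t):
            subst[t] = s
        else:
            return None
    return {v: resolve(v) for v in subst}


def reduce_query(head, body, g1, g2):
    """Remove g2 from the body, then apply mgu(g1, g2) to head and body."""
    sigma = mgu(g1, g2)
    if sigma is None:
        raise ValueError("g1 and g2 do not unify")
    new_body = []
    for rel, args in body:
        if (rel, args) == g2:
            continue  # eliminate g2
        atom = (rel, tuple(sigma.get(a, a) for a in args))
        if atom not in new_body:  # the body is a set of atoms
            new_body.append(atom)
    new_head = tuple(sigma.get(v, v) for v in head)
    return new_head, new_body


# q(?s) <- AttendsTable(?n, ?s, ?f0, ?f1), AttendsTable(?n, ?s, "AB23INF", ?y)
G2 = ("AttendsTable", ("?n", "?s", "?f0", "?f1"))
G1 = ("AttendsTable", ("?n", "?s", "AB23INF", "?y"))
reduced = reduce_query(("?s",), [G2, G1], G1, G2)
```

In this example the two atoms over AttendsTable collapse into a single atom that keeps the constant "AB23INF".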

In order to compute the answers to $q$ over $\mathcal{S}$, we need to
evaluate the set of conjunctive queries $P$ produced by the algorithm
Rewrite. Every query in $P$ is transformed into an SQL query. The
algorithm Answer, given a satisfiable KB $\mathcal{K}$ and a query $q$,
computes the answer to $q$ over $\mathcal{K}$. $Eval(q, \mathcal{D})$
denotes the evaluation of the SQL query $q$ over the database
$\mathcal{D}$.

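algorithm Answer($q$, $\mathcal{S}$, $\mathcal{D}$)
input: conceptual query $q$, DLR-DB system $\mathcal{S} = (\mathcal{K}, \mathcal{R}, \mathcal{M})$,
       database $\mathcal{D}$ for $\mathcal{R}$
output: $ans(q, \mathcal{S}, \mathcal{D})$
$\mathcal{K}$ := Normalize($\mathcal{K}$);
return $Eval(SQL(Rewrite(CQ(q, \mathcal{S}), \mathcal{S}), \mathcal{R}), \mathcal{D})$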

5 Conclusions 

In this document we have described DLR-DB, a query answering system
that enables users to pose queries over the conceptual schema of a
database, re-expressing a conceptual query in terms of the relations of
the underlying database and evaluating the rewriting over that database.
We have extended the DL DL-Lite to the DL DLR-Lite, which supports n-ary
relations, without losing the nice computational properties of the
developed reasoning techniques.

These results make it possible to formulate simpler queries, using only
terms defined in the conceptual schema, and to infer additional
information that was not stated explicitly in the user query but is
implied by the constraints at the conceptual level. At the same time, the
formalization of the conceptual schema and the reasoning techniques allow
for checking the consistency of the underlying database against the
conceptual schema.